Models trained on real-world data tend to imitate and amplify social biases. Although many methods have been proposed to mitigate bias, they require prior information on the types of bias to be mitigated (e.g., gender or racial bias) and on the social group associated with each data sample. In this work, we propose a debiasing method that operates without any prior knowledge of the demographics in the dataset: an auxiliary model that predicts the main model's success is used to detect biased examples, which are then down-weighted during training. Results on racial and gender bias demonstrate that it is possible to mitigate social biases without a costly demographic annotation process.
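The down-weighting step described above can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the focal-style weight `(1 - p) ** gamma`, the function names, and the toy numbers below are all assumptions made for illustration. Here `p` stands for the auxiliary model's predicted probability that the main model succeeds on an example, and a high `p` is treated as a sign that the example is solvable through a biased shortcut.

```python
def bias_aware_weights(success_probs, gamma=2.0):
    """Map predicted success probabilities to per-example weights in [0, 1].

    Assumption: a focal-style weight (1 - p) ** gamma. The abstract only
    says that detected examples are down-weighted, not how.
    """
    return [(1.0 - p) ** gamma for p in success_probs]


def weighted_loss(per_example_losses, success_probs, gamma=2.0):
    """Average the main model's losses after applying the weights."""
    weights = bias_aware_weights(success_probs, gamma)
    total = sum(w * l for w, l in zip(weights, per_example_losses))
    return total / len(per_example_losses)


# Hypothetical values: main-model losses and auxiliary success predictions.
losses = [0.2, 0.2, 1.5]
success = [0.95, 0.50, 0.10]
weights = bias_aware_weights(success)
debiased = weighted_loss(losses, success)
```

With `gamma=0.0` every weight is 1 and the function reduces to the ordinary mean loss, so `gamma` controls how strongly the suspected examples are suppressed.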
We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task? And how can clustering results be validated? Connectivity-based versus prototype-based approaches are reflected in the context of several popular methods: single-linkage, spectral embedding, k-means, and Gaussian mixtures are discussed as well as the density-based protocols (H)DBSCAN, Jarvis-Patrick, CommonNN, and density-peaks.
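As a small illustration of how a prototype-based clustering can be realised programmatically, the following is a minimal k-means sketch in plain Python. It is not taken from the review; the function names, the fixed iteration count, the random initialisation, and the toy points are assumptions chosen to keep the example short.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=20, seed=0):
    """Lloyd-style k-means: alternate assignment and center updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialise from data points
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign(points, centers), centers


# Two well-separated toy groups (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

Each cluster is represented by a single prototype (its mean), which is what distinguishes this family from the connectivity-based and density-based methods listed above.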
Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiencies and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path optimizations and a fused Einsum mapping strategy to bridge the gap between theoretical benefits and real hardware efficiency improvement. Our two-stage knowledge distillation flow resolves the trainability bottleneck and thus significantly boosts the final accuracy of factorized Transformers. Overall, we experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss and achieve a better efficiency-accuracy Pareto frontier than hand-tuned and heuristic baselines.
In this paper we look into the conjecture of Entezari et al. (2021) which states that if the permutation invariance of neural networks is taken into account, then there is likely no loss barrier to the linear interpolation between SGD solutions. First, we observe that neuron alignment methods alone are insufficient to establish low-barrier linear connectivity between SGD solutions due to a phenomenon we call variance collapse: interpolated deep networks suffer a collapse in the variance of their activations, causing poor performance. Next, we propose REPAIR (REnormalizing Permuted Activations for Interpolation Repair) which mitigates variance collapse by rescaling the preactivations of such interpolated networks. We explore the interaction between our method and the choice of normalization layer, network width, and depth, and demonstrate that using REPAIR on top of neuron alignment methods leads to 60%-100% relative barrier reduction across a wide variety of architecture families and tasks. In particular, we report a 74% barrier reduction for ResNet50 on ImageNet and 90% barrier reduction for ResNet18 on CIFAR10.
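Variance collapse and the rescaling idea can be shown on a single neuron with a toy sketch. The abstract states only that REPAIR rescales the preactivations of the interpolated network; the choice of target statistics below (interpolating the endpoint networks' mean and standard deviation), the function name, and the numbers are assumptions for illustration.

```python
import statistics


def repair_rescale(z_interp, stats_a, stats_b, alpha=0.5):
    """Rescale one neuron's preactivations to interpolated target statistics.

    stats_a and stats_b are (mean, std) of the same neuron in the two
    endpoint networks. Assumption: the target is their linear interpolation.
    """
    target_mu = (1 - alpha) * stats_a[0] + alpha * stats_b[0]
    target_sd = (1 - alpha) * stats_a[1] + alpha * stats_b[1]
    mu = statistics.mean(z_interp)
    sd = statistics.pstdev(z_interp)
    return [(z - mu) / sd * target_sd + target_mu for z in z_interp]


# Hypothetical preactivations of one aligned neuron over a batch of four inputs.
z_a = [1.0, -1.0, 2.0, -2.0]   # endpoint network A
z_b = [-1.0, 2.0, 1.0, -2.0]   # endpoint network B

# For a first linear layer, interpolating the weights interpolates the preactivations.
z_mid = [0.5 * a + 0.5 * b for a, b in zip(z_a, z_b)]

stats_a = (statistics.mean(z_a), statistics.pstdev(z_a))
stats_b = (statistics.mean(z_b), statistics.pstdev(z_b))
target_sd = 0.5 * stats_a[1] + 0.5 * stats_b[1]

collapsed_sd = statistics.pstdev(z_mid)      # smaller than either endpoint's std
repaired = repair_rescale(z_mid, stats_a, stats_b)
repaired_sd = statistics.pstdev(repaired)    # restored to target_sd
```

Because the two endpoint neurons are only partially correlated, averaging them shrinks the standard deviation, and the rescaling restores it to the level of the endpoints.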
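In many cases it is necessary to monitor a complex system by observing time series and to determine when anomalous events occur so that relevant actions can be taken. Determining whether the current observations are anomalous is challenging: it requires learning an extrapolative probabilistic model of the dynamics from historical data and using a limited number of current observations to make a classification. We leverage recent advances in long-term probabilistic forecasting, namely Deep Probabilistic Koopman, to build a general method for classifying anomalies in multi-dimensional time-series data. We also show how to use models with domain knowledge to reduce type I and type II errors. We demonstrate the proposed method on the important real-world task of global atmospheric pollution monitoring, integrating it with NASA's Global Earth System Model. The system successfully detects air-quality anomalies caused by events such as COVID-19 lockdowns and wildfires.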
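In this work, we propose a fully differentiable graph neural network (GNN)-based architecture for channel decoding and showcase competitive decoding performance for various coding schemes, such as low-density parity-check (LDPC) and BCH codes. The idea is to let a neural network (NN) learn a generalized message passing algorithm over a given graph that represents the forward error correction (FEC) code structure, by replacing the node and edge message updates with trainable functions. In contrast to many other deep learning-based decoding approaches, the proposed solution scales to arbitrary block lengths, and its training is not limited by the curse of dimensionality. We benchmark the proposed decoder against the state of the art in conventional channel decoding as well as against recent deep learning-based results. For the (63,45) BCH code, our solution outperforms weighted belief propagation (BP) decoding by approximately 0.4 dB with significantly fewer decoding iterations, and even for 5G NR LDPC codes we observe competitive performance compared with conventional BP decoding. For the BCH codes, the resulting GNN decoder can be fully parametrized with only 9640 weights.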
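Since the beginning of the COVID-19 pandemic, vaccines have been an important topic in public discourse. The discussion around vaccines is polarized: some see them as an important measure to end the pandemic, while others are hesitant or consider them harmful. This study investigates posts related to COVID-19 vaccines on Twitter and focuses on those with a negative stance toward vaccines. A dataset of 16,713,238 English tweets related to COVID-19 vaccines was collected, covering the period from March 1, 2020 to July 31, 2021. We used the Scikit-learn Python library to apply a support vector machine (SVM) classifier to identify tweets with a negative stance toward COVID-19 vaccines. A total of 5,163 tweets were used to train the classifier, of which 2,484 were manually annotated by us and made publicly available. We used the BERTopic model to extract and investigate the topics discussed in the negative tweets and how they changed over time. We show that negativity toward COVID-19 vaccines decreased over time as the vaccines were rolled out. We identify 37 topics of discussion and present their respective importance over time. We show that popular topics include conspiratorial discussions, such as 5G towers and microchips, but also concerns about vaccination safety and side effects as well as concerns about policies. Our study shows that even unpopular opinions or conspiracy theories can become widespread when paired with a widely popular discussion topic such as COVID-19 vaccines. Understanding the concerns and the topics discussed, and how they change over time, is essential for policymakers and public health authorities to provide better and more timely information and policies that facilitate vaccination of the population in similar future crises.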